Abstract — Nearest neighbors search is a fundamental problem in various research fields like machine learning, data mining and pattern
recognition. Recently, hashing-based approaches, e.g., Locality Sensitive Hashing (LSH), have proved to be effective for scalable high
dimensional nearest neighbors search. Many hashing algorithms find their theoretical root in random projection. Since these algorithms
generate the hash tables (projections) randomly, a large number of hash tables (i.e., long codewords) are required in order to achieve
both high precision and recall. To address this limitation, we propose a novel hashing algorithm called Density Sensitive Hashing
(DSH) in this paper. DSH can be regarded as an extension of LSH. By exploring the geometric structure of the data, DSH avoids the
purely random projections selection and uses those projective functions which best agree with the distribution of the data. Extensive
experimental results on real-world data sets have shown that the proposed method achieves better performance compared to the
state-of-the-art hashing approaches.

Index Terms — Locality Sensitive Hashing, Random Projection, Clustering. 






1 Introduction 

Nearest Neighbors (NN) search is a fundamental prob- 
lem and has found applications in many data mining 
tasks [9], [11], [14]. A number of efficient algorithms, 
based on pre-built index structures (e.g. KD-tree [ ] and 
R-tree [2]), have been proposed for nearest neighbors 
search. Unfortunately, these approaches perform worse
than a linear scan when the dimensionality of the space 
is high [ ]. 

Given the intrinsic difficulty of exact nearest neighbors
search, many hashing algorithms have been proposed for
Approximate Nearest Neighbors (ANN) search [1], [8],
[ ]. The key idea of these approaches is to generate
binary codewords for high dimensional data points that 
preserve the similarity between them. Roughly, these 
hashing methods can be divided into two groups, the 
random projection based methods and the learning 
based methods. 

Many hashing algorithms are based on the random 
projection, which has been proved to be an effective 
method to preserve pairwise distances for data points. 
One of the most popular methods is Locality Sensitive
Hashing (LSH) [1], [8], [10], [12]. Given a database with
n samples, LSH makes no prior assumption about the
data distribution and offers probabilistic guarantees of
retrieving items within $(1 + \epsilon)$ times the optimal
similarity, with query times that are sub-linear with respect
to n [22], [27]. However, according to the Johnson-Lindenstrauss
Theorem [17], LSH needs $O(\ln n / \epsilon^2)$ random
projections to preserve the pairwise distances, where $\epsilon$
is the relative error. Therefore, in order to increase the
probability that similar objects are mapped to similar
hash codes, LSH needs to use many random vectors to
generate the hash tables (a long codeword), leading to a
large storage space and a high computational cost.

Aiming at making full use of the structure of the data,
many learning-based hashing algorithms [ ], [15], [16],
[31], [32], [35], [38] have been proposed. Most of these
algorithms exploit the spectral properties of the data affinity
(i.e., item-item similarity) matrix for binary coding.
Despite the success of these approaches for relatively small
codes, they often fail to make significant improvement
as the code length increases [19].

In this paper, we propose a novel hashing algorithm 
called Density Sensitive Hashing (DSH) for effective high 
dimensional nearest neighbors search. Our algorithm can 
be regarded as an extension of LSH. Different from all 
the existing random projection based hashing methods, 
DSH tries to utilize the geometric structure of the data to 
guide the projections (hash tables) selection. Specifically, 
DSH uses k-means to roughly partition the data set
into k groups. Then for each pair of adjacent groups,
DSH generates one projection vector which can well split
the two corresponding groups. From all the generated
projections, DSH selects the final ones according to the
maximum entropy principle, in order to maximize the
information provided by each bit. Experimental results
show the superior performance of the proposed Density 
Sensitive Hashing algorithm over the existing state-of- 
the-art approaches. 

The remainder of this paper is organized as follows. 
We introduce the background and review the related 
work in Section 2. Our Density Sensitive Hashing algorithm
is presented in Section 3. Section 4 gives the
experimental results that compare our algorithm with
the state-of-the-art hashing methods on three real-world
large scale data sets. Concluding remarks are provided
in Section 5.

2 Background and Related Work

The generic hashing problem is the following. Given
data points $X = [x_1, \cdots, x_n] \in \mathbb{R}^{d \times n}$, find L hash
functions to map a data point x to an L-bit hash code

$$H(x) = [h_1(x), h_2(x), \ldots, h_L(x)],$$

where $h_l(x) \in \{0, 1\}$ is the l-th hash function. For the
linear projection-based hashing, we have [ ]

$$h_l(x) = \mathrm{sgn}\big(F(w_l^T x + t_l)\big) \qquad (1)$$

where $w_l$ is the projection vector and $t_l$ is the intercept.
Different hashing algorithms aim at finding different F,
$w_l$ and $t_l$ with respect to different objective functions.

One of the most popular hashing algorithms is Locality
Sensitive Hashing (LSH) [1], [8], [10], [12]. LSH is
fundamentally based on the random projection and uses
randomly generated $w_l$. The F in LSH is an identity
function and $t_l = 0$ for mean thresholding¹. Thus, for
LSH, we have

$$h_l(x) = \begin{cases} 1 & \text{if } w_l^T x \ge 0 \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$

where $w_l$ is a vector generated from a zero-mean
multivariate Gaussian $\mathcal{N}(0, I)$ of the same dimension as
the input x. From the geometric point of view, $w_l$
defines a hyperplane. The points on different sides of
the hyperplane have opposite labels. Using this hash
function, two points' hash bits match with a probability
determined by the angle between them [8]: the smaller
the angle, the more likely the bits agree. Specifically, for
any two points $x_i, x_j \in \mathbb{R}^d$, we have [22]:

$$\Pr[h_l(x_i) = h_l(x_j)] = 1 - \frac{1}{\pi} \cos^{-1}\left(\frac{x_i^T x_j}{\|x_i\| \|x_j\|}\right) \qquad (3)$$

Based on this nice property, LSH has the probabilistic
guarantees of retrieving items within $(1 + \epsilon)$ times the
optimal similarity, with query times that are sub-linear
with respect to n [22], [27].
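The LSH hash function of Eq.(2) and the matching probability of Eq.(3) can be illustrated with a minimal sketch. It uses only the Python standard library; the function names (`make_projections`, `lsh_code`, `match_probability`) and the toy input are our own illustrative choices and are not taken from any released implementation.

```python
import math
import random


def make_projections(dim, num_bits, seed=0):
    """Draw num_bits projection vectors with i.i.d. N(0, 1) entries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_code(x, projections):
    """Eq.(2): bit l is 1 if w_l^T x >= 0 and 0 otherwise."""
    return [1 if sum(w_k * x_k for w_k, x_k in zip(w, x)) >= 0 else 0
            for w in projections]


def match_probability(x_i, x_j):
    """Eq.(3): probability that one random bit agrees for x_i and x_j."""
    dot = sum(a * b for a, b in zip(x_i, x_j))
    norms = math.sqrt(sum(a * a for a in x_i)) * math.sqrt(sum(b * b for b in x_j))
    cosine = max(-1.0, min(1.0, dot / norms))  # guard against rounding
    return 1.0 - math.acos(cosine) / math.pi


# Toy usage: an 8-bit code for a (zero-mean) 3-dimensional point.
projections = make_projections(dim=3, num_bits=8)
code = lsh_code([0.5, -1.0, 2.0], projections)
```

Under Eq.(3), two orthogonal points agree on a given bit with probability 1/2, while parallel points always agree, which is why many random bits are needed before the Hamming distance becomes a reliable proxy for the angle.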

Empirical studies [1] showed that LSH is significantly
more efficient than the methods based on hierarchical
tree decomposition. It has been successfully used
in various applications in data mining [9], [14], computer
vision [32], [34] and database [20], [21]. There are many
extensions for LSH [18], [22], [25], [28]. Entropy based
LSH [ ] and Multi-Probe LSH [25], [18] are proposed
to reduce the space requirement in LSH but need much
longer time to deal with the query. The original LSH
methods cannot be applied to high-dimensional kernelized
data when the underlying feature embedding for the
kernel is unknown. To address this limitation, Kernelized
Locality Sensitive Hashing is introduced in [22]. It
approximates a normal distribution in the kernel
space using only kernel comparisons [19]. In addition,
the Shift Invariant Kernels Hashing [30], which is a
distribution-free method based on the random features
mapping for shift-invariant kernels, is also proposed
recently. This method has theoretical convergence guarantees
and performs well for relatively large code sizes [ ].
All these methods are fundamentally based on the random
projection. According to the Johnson-Lindenstrauss
Theorem [17], $O(\ln n / \epsilon^2)$ projective vectors are needed
to preserve the pairwise distances of a database with
size n for the random projection, where $\epsilon$ is the relative
error. Therefore, in order to increase the probability that
similar objects are mapped to similar hash codes, these
random projection based hashing methods need to use
many random vectors to generate the hash tables (a long
codeword), leading to a large storage space and a high
computational cost.

1. Without loss of generality, we assume that all the data points are
centralized to have zero mean.

To address the above limitation, many learning-based
hashing methods [3], [6], [13], [15], [16], [19], [23], [24],
[26], [27], [29], [31], [32], [35], [36], [37], [38] have been proposed.
PCA Hashing [34] might be the simplest one. It chooses
$w_l$ in Eq.(1) to be the principal directions of the data. Many
other algorithms [ ], [ ], [35], [38] exploit the spectral
properties of the data affinity (i.e., item-item similarity)
matrix for binary coding. The spectral analysis of the
data affinity matrix is usually time consuming [ ]. To
avoid the high computational cost, Weiss et al. [35] made
a strong assumption that data is uniformly distributed
and proposed a Spectral Hashing method (SpH). The
assumption in SpH leads to a simple analytical eigenfunction
solution of 1-D Laplacians, but the geometric
structure of the original data is almost ignored, leading
to a suboptimal performance. Anchor Graph Hashing
(AGH) [24] is a recently proposed method to overcome
this shortcoming. AGH generates k anchor points from
the data and represents all the data points by sparse
linear combinations of the anchors. In this way, the
spectral analysis of the data affinity can be efficiently
performed. Some other learning based hashing methods
include Semantic Hashing [ ], which uses the stacked
Restricted Boltzmann Machine (RBM) to generate the
compact binary codes, and Semi-supervised Sequential
Projection Hashing (S3PH) [ ], which can incorporate
supervision information. Despite the success of these
learning based hashing approaches for relatively small codes,
they often fail to make significant improvement as the
code length increases [19].

3 Density Sensitive Hashing 

In this section, we give a detailed description of our
proposed Density Sensitive Hashing (DSH), which aims
at overcoming the disadvantages of both random projection
based and learning based hashing approaches.
So that the performance keeps increasing as the code
length increases, DSH adopts a framework similar to that
of LSH. Different from LSH, which generates the projections
randomly, DSH uses the geometric structure of the data
to guide the selection of the projections.

Figure 1 presents a toy example to illustrate the basic
idea of our approach. There are four Gaussians in a
two dimensional plane and one is asked to encode the
data using two-bit hash codes. LSH [8] generates the
projections randomly and it is very possible that the
data points from the same Gaussian will be encoded
by different hash codes. PCA Hashing [34] uses the
principal directions of the data as the projective vectors.
In our example, all the four Gaussians are split
and PCA Hashing generates an unsatisfactory coding.
Considering the geometric structure of the data (density
of the data), our DSH generates perfect binary codes for
this toy example. The detailed procedure of DSH will be
provided in the following subsections.

Fig. 1. Illustration of different hashing methods on a toy data set: (a) Locality Sensitive Hashing [8], (b) PCA Hashing [34],
(c) Density Sensitive Hashing. There are four Gaussians in a two dimensional plane and one is asked to encode the data
using two-bit hash codes. (a) LSH generates the projections randomly and it is very possible that the data points from the
same Gaussian will be encoded by different hash codes. (b) PCA Hashing uses the principal directions of the data as the
projective vectors. In our example, all the four Gaussians are split and PCA Hashing generates an unsatisfactory coding.
(c) Considering the geometric structure of the data (density of the data), our DSH generates perfect binary codes for this
toy example.



3.1 Minimum Distortion Quantization 

The first step of DSH is quantization of the data.
Recently, Pauleve et al. [ ] showed that a quantization
preprocess for the data points can significantly improve the
performance of the nearest neighbors search. Motivated
by this result, we use the k-means algorithm, one of the
most popular quantization approaches, to partition the
n points into k (k < n) groups.

Let $S = \{S_1, \cdots, S_k\}$ denote a given quantization
result. The distortion, also known as the Sum of Squared
Error (SSE), can be used to measure the quality of the
given quantization:

$$SSE = \sum_{p=1}^{k} \sum_{x \in S_p} \|x - \mu_p\|^2 \qquad (4)$$

where $\mu_p$ is the representative point of the p-th group $S_p$.
By noticing

$$\frac{\partial SSE}{\partial \mu_p} = \frac{\partial}{\partial \mu_p} \sum_{p=1}^{k} \sum_{x \in S_p} \|x - \mu_p\|^2 = -2 \sum_{x \in S_p} (x - \mu_p)$$

and setting this derivative to zero, we have:

$$\mu_p = \frac{1}{|S_p|} \sum_{x \in S_p} x, \qquad p = 1, 2, \ldots, k \qquad (5)$$

This indicates that in order to minimize the distortion, we
can choose the center point (the mean of the group) as
the representative point for each group.

There are two points that need to be highlighted for
the k-means quantization in our approach:

1) In large scale applications, it can be time consuming
to wait for k-means to converge. Naturally, we
can stop k-means after p iterations, where p is
a parameter. We found that a small p is
usually enough (usually 5). This will be discussed
in our experiments.

2) In real applications, we do not know the
best group number k. It seems that the bigger
k is, the better the performance will be, simply
because the quantization has a smaller error
with a larger number of groups. However, a large
number of groups could lead to a high computational
cost in the quantization step. As will be
described in the next subsection, the number of
groups decides the maximum code length DSH can
generate. Thus, we set

$$k = \alpha L \qquad (6)$$

where L is the code length and α is a parameter.
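The quantization step can be sketched as k-means stopped after p rounds, with $k = \alpha L$ as in Eq.(6). This is a minimal pure-Python illustration under our own assumptions: the helper names, the random initialization from the data points, and the toy data are hypothetical and not prescribed by the text above.

```python
import random


def squared_distance(a, b):
    return sum((a_k - b_k) ** 2 for a_k, b_k in zip(a, b))


def truncated_kmeans(points, k, p, seed=0):
    """Run p assignment/update rounds of k-means from a random start."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(points, k)]  # assumed initialization
    assignment = [0] * len(points)
    for _ in range(p):
        # Assignment step: attach every point to its closest center.
        assignment = [min(range(k), key=lambda j: squared_distance(x, centers[j]))
                      for x in points]
        # Update step, Eq.(5): each center becomes the mean of its group.
        for j in range(k):
            members = [x for x, a in zip(points, assignment) if a == j]
            if members:  # an empty group keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignment


def sse(points, centers, assignment):
    """Eq.(4): sum of squared errors of a quantization."""
    return sum(squared_distance(x, centers[a]) for x, a in zip(points, assignment))


L, alpha, p = 2, 1.5, 3            # toy values; Eq.(6) gives k = alpha * L
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [9.0, 9.0],
          [9.0, 10.0], [10.0, 9.0], [5.0, 5.0]]
k = round(alpha * L)               # 3 groups
centers, assignment = truncated_kmeans(points, k, p)
```

Because each assignment step and each update step can only lower the SSE of Eq.(4), stopping after a few rounds trades a slightly larger distortion for a training cost that grows linearly in p.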

3.2 Density Sensitive Projections Generation 

Now we have the quantization result denoted by k
groups $S_1, \cdots, S_k$, and the i-th group has the center $\mu_i$.
Instead of generating projections randomly as LSH does,
our DSH tries to use this quantization result to guide the
projection generating process.

We define the r-nearest neighbors matrix W of the
groups as follows:

Definition 1: r-Nearest Neighbors Matrix W of the
groups.

$$W_{ij} = \begin{cases} 1, & \text{if } \mu_i \in N_r(\mu_j) \text{ or } \mu_j \in N_r(\mu_i) \\ 0, & \text{otherwise} \end{cases} \qquad (7)$$

where $N_r(\mu_i)$ denotes the set of r nearest neighbors of
$\mu_i$.

With this definition, we can then define r-adjacent
groups:

Definition 2: r-Adjacent Groups: Group $S_i$ and group
$S_j$ are called r-adjacent groups if and only if $W_{ij} = 1$.

Instead of picking a random projection, it is more natural
to pick those projections which can well separate two
adjacent groups.

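For each pair of adjacent groups $S_i$ and $S_j$, DSH uses
the median plane between the centers of the two groups
as the hyperplane to separate points. The median plane
is defined as follows:

$$\left(x - \frac{\mu_i + \mu_j}{2}\right)^T (\mu_i - \mu_j) = 0 \qquad (8)$$

One can easily verify that the hash function associated
with this plane is defined as follows:

$$h(x) = \begin{cases} 1 & \text{if } w^T x \ge t \\ 0 & \text{otherwise} \end{cases} \qquad (9)$$

where

$$w = \mu_i - \mu_j, \qquad t = \left(\frac{\mu_i + \mu_j}{2}\right)^T (\mu_i - \mu_j) \qquad (10)$$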
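A small sketch may help to see how Definitions 1 and 2 and Eqs.(8)-(10) fit together: adjacent pairs are found from the group centers, and each pair yields one candidate pair (w, t). The function names and the three toy centers are illustrative assumptions; a brute-force neighbor search is used for clarity rather than speed.

```python
def dot(a, b):
    return sum(a_k * b_k for a_k, b_k in zip(a, b))


def squared_distance(a, b):
    return sum((a_k - b_k) ** 2 for a_k, b_k in zip(a, b))


def r_adjacent_pairs(centers, r):
    """Definitions 1 and 2: index pairs (i, j), i < j, with W_ij = 1."""
    k = len(centers)
    pairs = set()
    for i in range(k):
        others = sorted((j for j in range(k) if j != i),
                        key=lambda j: squared_distance(centers[i], centers[j]))
        for j in others[:r]:               # the r nearest centers of mu_i
            pairs.add((min(i, j), max(i, j)))
    return sorted(pairs)


def median_plane(mu_i, mu_j):
    """Eq.(10): w = mu_i - mu_j and t = ((mu_i + mu_j) / 2)^T w."""
    w = [a - b for a, b in zip(mu_i, mu_j)]
    midpoint = [(a + b) / 2.0 for a, b in zip(mu_i, mu_j)]
    return w, dot(midpoint, w)


def hash_bit(x, w, t):
    """Eq.(9): 1 if w^T x >= t, else 0."""
    return 1 if dot(w, x) >= t else 0


# Toy usage: three centers on a line, each linked to its single nearest neighbor.
centers = [[0.0, 0.0], [2.0, 0.0], [10.0, 0.0]]
candidates = [median_plane(centers[i], centers[j])
              for i, j in r_adjacent_pairs(centers, r=1)]
```

With this sign convention, points on the $\mu_i$ side of the median plane receive bit 1 and points on the $\mu_j$ side receive bit 0, so the two centers of an adjacent pair are always separated by the bit they generate.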



3.3 Entropy Based Projections Selection 

Given k groups, the previous step can generate around
$\frac{1}{2}kr$ projections. Since $k = \alpha L$, our DSH generates about
$\frac{1}{2}\alpha r L$ projections so far. Each projection will lead to one bit
in the code and the usual setting of the parameters α, r
will make $\frac{1}{2}\alpha r L > L$. Thus, our DSH needs a projection
selection step which aims at selecting L projections from
the candidate set containing $\frac{1}{2}\alpha r L$ projections.

From the information theoretic point of view, "good"
binary codes should maximize the information/entropy
provided by each bit [ ]. By the maximum entropy
principle, a binary bit that gives a balanced partitioning of the
data points provides maximum information [ ]. Thus,
we compute the entropy of each candidate projection
and select the projections which split the data most
equally.

Assume we have m candidate projections
$w_1, w_2, \cdots, w_m$. For each projection, the data points
are separated into two sets and labeled with opposite
bits. We denote these two partitions as $T_{i0}$ and $T_{i1}$,
respectively. The entropy $\delta_i$ with respect to the
projection $w_i$ can be computed as:

$$\delta_i = -P_{i0} \log P_{i0} - P_{i1} \log P_{i1} \qquad (11)$$

where:

$$P_{i0} = \frac{|T_{i0}|}{|T_{i0}| + |T_{i1}|}, \qquad P_{i1} = \frac{|T_{i1}|}{|T_{i0}| + |T_{i1}|} \qquad (12)$$



In practice, the database can be very large and computing
the entropy of each projection with respect to the
entire database is time consuming. Thus, we estimate the
entropy simply by using the group centers. For group
center $\mu_i$, we assign a weight $\nu_i$ based on the size of the
group:

$$\nu_i = \frac{|S_i|}{\sum_{p=1}^{k} |S_p|} \qquad (13)$$

We denote the two sets of group centers as $C_{i0}$ and $C_{i1}$.
Then $P_{i0}$ and $P_{i1}$ can be computed as:

$$P_{i0} = \sum_{s \in C_{i0}} \nu_s, \qquad P_{i1} = \sum_{t \in C_{i1}} \nu_t \qquad (14)$$

This simplification significantly reduces the time cost of
the entropy calculation.

After obtaining the entropy $\delta_i$ for each $w_i$, we sort them
in descending order and use the top L projections for
creating the L-bit binary codes, according to Eq.(9). The
overall procedure of our DSH algorithm is summarized
in Alg. 1.
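The selection step can be sketched as follows: the entropy of each candidate is estimated from the size-weighted group centers (Eqs.(11), (13) and (14)) and the L highest-entropy candidates are kept. The names `center_entropy` and `select_projections` and the one-dimensional toy data are our own assumptions for illustration.

```python
import math


def dot(a, b):
    return sum(a_k * b_k for a_k, b_k in zip(a, b))


def center_entropy(w, t, centers, sizes):
    """Eqs.(11), (13), (14): entropy of one bit, estimated from weighted centers."""
    total = float(sum(sizes))
    # P_i1 is the total weight of the centers that fall on the "1" side.
    p1 = sum(size for c, size in zip(centers, sizes) if dot(w, c) >= t) / total
    p0 = 1.0 - p1
    return -sum(p * math.log(p) for p in (p0, p1) if p > 0.0)


def select_projections(candidates, centers, sizes, num_bits):
    """Keep the num_bits candidates (w, t) with the largest entropy."""
    ranked = sorted(candidates,
                    key=lambda wt: center_entropy(wt[0], wt[1], centers, sizes),
                    reverse=True)
    return ranked[:num_bits]


# Toy usage: four equally sized groups on a line and three candidate thresholds.
centers = [[0.0], [1.0], [2.0], [3.0]]
sizes = [10, 10, 10, 10]                       # |S_i| for each group
candidates = [([1.0], 0.5), ([1.0], 1.5), ([1.0], 9.0)]
best = select_projections(candidates, centers, sizes, num_bits=1)
```

In this toy case the threshold 1.5 splits the weighted centers evenly and reaches the maximum entropy log 2, the threshold 0.5 gives a 1:3 split, and the threshold 9.0 puts every center on the same side and carries no information.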

3.4 Computational Complexity Analysis 

Given n data points with the dimensionality d, the
computational complexity of DSH in the training stage
is as follows:

1) $O(\alpha L p n d)$: k-means with p iterations to generate
αL groups (Step 1 in Alg. 1).

2) $O((\alpha L)^2 (d + r))$: Find all the r-adjacent groups (Step
2 in Alg. 1).

3) $O(\alpha L r d)$: For each pair of adjacent groups, generate
the projection and the intercept (Step 3 in Alg. 1).

4) $O((\alpha L)^2 d r)$: Compute the entropy for all the candidate
projections (Step 4 in Alg. 1).

5) $O(\alpha L r \log(\alpha L r))$: Find the top L projections.
The binary codes for data points
can then be obtained in $O(Lnd)$ (Step 5 in Alg. 1).

Considering $\alpha L r \ll n$, the overall computational
complexity of DSH training is dominated by the k-means
clustering step, which is $O(\alpha L p n d)$. It is clear that DSH
scales linearly with respect to the number of samples in
the database.
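To make the relative size of these terms concrete, the following sketch evaluates them for one setting. The values n = 10^6, d = 960, L = 64, α = 1.5, p = 3 and r = 3 are taken from the experimental section as an illustrative assumption; constants hidden by the O-notation are ignored, so the numbers are only indicative.

```python
import math


def training_costs(n, d, L, alpha, p, r):
    """Evaluate the complexity terms of Section 3.4 (constants ignored)."""
    k = round(alpha * L)                      # Eq.(6): number of groups
    return {
        "kmeans": k * p * n * d,              # Step 1
        "adjacency": k ** 2 * (d + r),        # Step 2
        "projections": k * r * d,             # Step 3
        "entropy": k ** 2 * d * r,            # Step 4
        "sorting": k * r * math.log(k * r),   # Step 5, selecting the top L
        "encoding": L * n * d,                # Step 5, coding the database
    }


costs = training_costs(n=10**6, d=960, L=64, alpha=1.5, p=3, r=3)
dominant = max(costs, key=costs.get)
```

For this setting the k-means term is several orders of magnitude larger than the adjacency, projection and entropy terms, and about αp = 4.5 times larger than the cost of encoding the database.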

Algorithm 1 Density Sensitive Hashing
Input:
    n training samples $x_1, x_2, \ldots, x_n \in \mathbb{R}^d$;
    L: the number of bits for hashing codes;
    α: the parameter controlling the groups number;
    p: the number of iterations in k-means;
    r: the parameter for r-adjacent groups
1: Use k-means with p iterations to generate αL groups,
   with centers $\mu_1, \cdots, \mu_{\alpha L}$.
2: Generate the list of all r-adjacent groups based on
   Definitions 1 and 2.
3: For each pair of adjacent groups, use Eq.(10) to
   generate the projection w and intercept t.
4: Calculate the entropy of all the candidate projections
   using the weighted center points based on Eq.(11)
   and Eq.(14).
5: Sort the entropy values in descending order and use
   the top L projections to create binary codes according
   to Eq.(9).
Output:
    The model: $\{w_i, t_i\}_{i=1}^{L}$
    Binary hashing codes for the training samples: $Y \in \{0, 1\}^{n \times L}$


In the testing stage, given a query point, DSH needs
$O(Ld)$ to compress the query point into a binary code,
which is the same as the complexity of Locality Sensitive
Hashing.
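The testing stage can be sketched as L inner products followed by thresholding, after which database items are ranked by Hamming distance to the query code. The model below is a hypothetical two-bit example written by hand, not the output of a trained DSH model, and the function names are our own.

```python
def dot(a, b):
    return sum(a_k * b_k for a_k, b_k in zip(a, b))


def encode(x, model):
    """Apply Eq.(9) for every (w, t) in the model: O(L d) for one point."""
    return [1 if dot(w, x) >= t else 0 for w, t in model]


def hamming(code_a, code_b):
    """Number of positions in which two binary codes differ."""
    return sum(u != v for u, v in zip(code_a, code_b))


def rank_by_hamming(query_code, database_codes):
    """Indices of database items, closest code first (ties keep index order)."""
    return sorted(range(len(database_codes)),
                  key=lambda i: hamming(query_code, database_codes[i]))


# Toy usage with a hand-written 2-bit model {(w_1, t_1), (w_2, t_2)}.
model = [([1.0, 0.0], 0.5), ([0.0, 1.0], 0.5)]
database = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
database_codes = [encode(x, model) for x in database]
ranking = rank_by_hamming(encode([0.9, 0.1], model), database_codes)
```

Since Python's `sorted` is stable, items with equal Hamming distance keep their database order; a real system would typically pack the bits into machine words and use bit operations instead of lists.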

4 Experiment 

In this section, we evaluate our DSH algorithm on 
the high dimensional nearest neighbor search problem. 
Three large scale real-world data sets are used in our 
experiments. 

• GIST1M: It contains one million GIST features and
each feature is represented by a 960-dim vector. This
data set is publicly available².

• Flickr1M: We collect one million images from
Flickr and use a feature extraction code³ to extract
a GIST feature for each image. Each image is represented
by a 512-dim GIST feature vector. This data
set is publicly available⁴.

• SIFT1M: It contains one million SIFT features and
each feature is represented by a 128-dim vector. This
data set is publicly available⁵.

2. http://corpus-texmex.irisa.fr

For each data set, we randomly select 1k data points
as the queries and use the remaining points to form the gallery
database. We use the same criterion as in [33], [36]:
a returned point is considered to be a true neighbor if
it lies in the top 2 percent of points closest (measured
by the Euclidean distance in the original space) to the
query. For each query, all the data points in the database
are ranked according to their Hamming distances to the
query. We evaluate the retrieval results by the Mean
Average Precision (MAP) and the precision-recall curve
[33]. In addition, we also report the training time and
the testing time (the average time used for each query)
for all the methods.
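As an illustration of the evaluation metric, the sketch below computes average precision for one Hamming-ranked list and averages it over queries. This is the standard textbook definition of MAP, given here as an assumption; the exact protocol of [33] (for example, how ties in Hamming distance are handled) is not specified in this paper, and the function names are our own.

```python
def average_precision(ranked_ids, relevant_ids):
    """Mean of the precision values at the ranks where a true neighbor appears."""
    relevant = set(relevant_ids)
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(rankings, ground_truths):
    """Average the per-query average precision over all queries."""
    scores = [average_precision(r, g) for r, g in zip(rankings, ground_truths)]
    return sum(scores) / len(scores)


# Toy usage: true neighbors {3, 2} are retrieved at ranks 1 and 3.
ap = average_precision([3, 1, 2, 0], [3, 2])
```

In the toy example the precision is 1/1 at the first hit and 2/3 at the second, so the average precision is (1 + 2/3) / 2 = 5/6.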

4.1 Compared Algorithms 

Seven state-of-the-art hashing algorithms for high
dimensional nearest neighbors search are compared as
follows:

• Locality Sensitive Hashing (LSH) [8], which is based
on the random projection. The projective vectors
are randomly sampled from a p-stable distribution
(e.g., Gaussian). We implement the algorithm
ourselves and make it publicly available⁶.

• Kernelized Locality Sensitive Hashing (KLSH) [22],
which generalizes the LSH method to the kernel
space. We use the code provided by the authors⁷.

3. http://www.vision.ee.ethz.ch/~zhuji/felib.html

4. http://www.cad.zju.edu.cn/home/dengcai/Data/NNSData.html

5. http://corpus-texmex.irisa.fr

6. http://www.cad.zju.edu.cn/home/dengcai/Data/DSH.html

7. http://www.cse.ohio-state.edu/~kulis/klsh/klsh.htm

Fig. 3. The precision-recall curves of all algorithms on three data sets for the codes of 48 bits and 96 bits: (a) GIST1M, (b) Flickr1M, (c) SIFT1M.



• Shift-Invariant Kernel Hashing (SIKH) [30], which
is a distribution-free method based on the random
features mapping for approximating shift-invariant
kernels. The code is also publicly available⁸.

• Principal Component Analysis Hashing
(PCAH) [34], which directly uses the top principal
directions as the projective vectors to obtain the
binary codes. The implementation of PCA is
publicly available⁹.

• Spectral Hashing (SpH) [35], which is based on
quantizing the values of analytical eigenfunctions
computed along PCA directions of the data. We use
the code provided by the authors¹⁰.

• Anchor Graph Hashing (AGH) [24], which constructs
an anchor graph to speed up the spectral
analysis procedure. Two-layer AGH is used in
our comparison for its superior performance over
one-layer AGH [24]. We use the code provided
by the authors¹¹; the number of anchors is set to
300 and the number of nearest neighbors is set
to 2, as suggested in [24].

• Density Sensitive Hashing (DSH), which is the
method introduced in this paper. For the purpose
of reproducibility, we also make the code publicly
available¹². There are three parameters. We empirically
set p = 3 (the number of iterations in k-means),
α = 1.5 (controlling the groups number) and r = 3
(for r-adjacent groups). A detailed analysis of the
parameter selection will be provided later.

It is important to note that LSH, KLSH and SIKH are
random projection based methods, while PCAH, SpH
and AGH are learning based methods. Our DSH can be
regarded as a combination of these two directions.

8. http://www.unc.edu/~yunchao/itq.htm

4.2 Experimental Results 

Figure 2 shows the MAP curves of all the algorithms
on the GIST1M, Flickr1M and SIFT1M data sets, respectively.
We can see that the three random projection based
methods (LSH, KLSH and SIKH) have a low MAP when
the code length is short. As the code length increases,
the performance of all three methods consistently
increases. On the other hand, the learning based methods
(PCAH, SpH and AGH) have a high MAP when the code
length is short. However, they fail to make significant
improvements as the code length increases. Particularly,
the performance of PCAH decreases as the code length
increases. This is consistent with previous work [13],
[33] and is probably because most of the data
variance is contained in the top few principal directions,
so that the later bits are calculated using the low-variance
projections, leading to poorly discriminative codes
[ ]. By utilizing the geometric structure of the data to
guide the projections selection, our DSH successfully
combines the advantages of both random projection
based methods and the learning based methods. As a
result, DSH achieves a satisfactory performance on the three
data sets and outperforms its competitors for almost all
code lengths. It is interesting to see that the performance
improvements of DSH over other methods on GIST1M
and Flickr1M are larger than that on SIFT1M. Since the
dimensions of the data in GIST1M (960-d) and Flickr1M
(512-d) are much larger than that in SIFT1M (128-d), this
suggests that our DSH method is particularly suitable
for high dimensional situations. Figure 3 presents the
precision-recall curves of all the algorithms on three data
sets with the codes of 48 bits and 96 bits.

9. http://www.cad.zju.edu.cn/home/dengcai/Data/DimensionReduction.html

10. http://www.cs.huji.ac.il/~yweiss/SpectralHashing/

11. http://www.ee.columbia.edu/~wliu/

12. http://www.cad.zju.edu.cn/home/dengcai/Data/DSH.html

TABLE 1
Training and testing time of all algorithms on GIST1M.

             Training Time (s)                    Test Time (×10^-6 s)
Method       L = 16   L = 32   L = 64   L = 96    L = 16   L = 32   L = 64   L = 96
LSH [8]         0.4      1.0      2.3      2.6       1.2      2.6      5.8      7.1
KLSH [22]      27.4     27.7     27.9     28.3      30.0     32.3     34.7     36.5
SIKH [30]       1.3      2.5      3.8      5.2       3.9      6.3     10.5     15.9
PCAH [34]      31.3     57.2     60.3     75.0       1.2      2.7      5.6      7.2
SpH [35]       42.5     77.8    125.3    239.8      23.9     42.1     93.4    270.1
AGH [24]      340.8    344.7    349.8    356.0      33.3     52.6     71.2    191.3
DSH            33.1     45.9     56.5     63.6       1.3      2.7      5.8      7.1

TABLE 2
Training and testing time of all algorithms on Flickr1M.

             Training Time (s)                    Test Time (×10^-6 s)
Method       L = 16   L = 32   L = 64   L = 96    L = 16   L = 32   L = 64   L = 96
LSH [8]         0.3      0.8      1.5      1.8       0.9      2.0      2.8      4.6
KLSH [22]      18.2     18.5     18.9     19.4      15.2     18.9     22.8     25.1
SIKH [30]       1.1      2.0      2.8      4.3       2.8      3.8      9.1     12.3
PCAH [34]      16.7     29.4     31.4     33.7       1.1      2.1      2.9      4.9
SpH [35]       22.3     45.6    106.2    205.5      16.7     38.5     88.6    251.6
AGH [24]      232.9    247.9    257.4    268.1      28.2     42.2     52.2    155.3
DSH            17.4     29.3     35.8     45.9       0.9      2.1      2.8      4.6

TABLE 3
Training and testing time of all algorithms on SIFT1M.

             Training Time (s)                    Test Time (×10^-6 s)
Method       L = 16   L = 32   L = 64   L = 96    L = 16   L = 32   L = 64   L = 96
LSH [8]         0.1      0.3      0.6      0.8       0.4      1.1      1.8      2.4
KLSH [22]      10.2     10.4     10.8     11.2      12.2     13.1     13.8     15.7
SIKH [30]       0.5      1.1      2.3      3.5       0.9      2.3      6.3      7.0
PCAH [34]       3.9      6.5      7.5      7.8       0.5      1.3      2.0      2.5
SpH [35]       11.4     28.1     92.7    189.1      11.8     33.3     77.1    230.9
AGH [24]      135.2    142.5    148.1    155.1      15.3     23.9     31.2     57.1
DSH             8.4     12.2     15.5     20.1       0.5      1.2      1.9      2.6


Tables 1, 2 and 3 show both the training and testing
time for different algorithms on the three data sets,
respectively. We can clearly see that both the training and
testing time of all the methods decrease as the dimension
of the data decreases. Considering the training time, the
three random projection based algorithms are relatively
efficient, especially LSH and SIKH. KLSH needs to
compute a sampled kernel matrix, which slows down
its computation. The three learning based algorithms
are relatively slow because they explore the data structure. Our
DSH is also fast. Although it is slower than the three
random projection based algorithms, it is significantly
faster than SpH and AGH. Considering the testing time,
LSH, PCAH and our DSH are the most efficient methods.
All of them simply need a matrix multiplication and
a thresholding to obtain the binary codes. SpH consumes
much longer time than other methods as the code
length increases since it needs to compute the analytical
eigenfunctions involving the calculation of trigonometric
functions.

4.3 Parameter Selection 

Our DSH has three parameters: p (the number of
iterations in k-means), α (the parameter controlling the
groups number) and r (the parameter for r-adjacent
groups). In this subsection, we discuss how the performance
of DSH is influenced by these three parameters.
We learn 64-bit hashing codes and the default
setting for these parameters is p = 3, α = 1.5 and r = 3.
When we study the impact of one parameter, the other
parameters are fixed at their default values.

Fig. 4. The performance of DSH vs. the number of iterations of k-means (p) at 64 bits.

TABLE 4
Training time (s) of DSH vs. the number of iterations of k-means (p) at 64 bits.

Data Set     p = 1    p = 2    p = 3    p = 4    p = 5    p = 6
GIST1M        18.8     37.2     56.5     76.2     94.1    111.7
Flickr1M      11.7     23.5     35.8     48.1     62.6     76.4
SIFT1M         4.8      9.1     15.5     21.2     25.5     31.9

Fig. 5. The performance of DSH vs. the parameter α (controlling the number of groups) at 64 bits.

TABLE 5
Training time (s) of DSH vs. the parameter α (controlling the number of groups) at 64 bits.

Data Set     α = 0.5  α = 1.0  α = 1.5  α = 2.0  α = 2.5  α = 3.0
GIST1M         48.4     52.9     56.5     68.4     74.6     83.5
Flickr1M       21.8     25.3     35.8     46.7     57.3     68.2
SIFT1M          9.8     11.2     15.5     21.3     28.2     37.9


Figure 4 and Table 4 show how the performance
of DSH varies as the number of iterations in k-means
varies. As the number of iterations increases, it is
reasonable to see that both the MAP and the learning time
of DSH increase. On all the three data sets, 3 iterations
in k-means are enough for achieving reasonably good
MAP.

Figure 5 and Table 5 show how the performance of
DSH varies as α changes (the groups number generated
by k-means changes). As we can see, as α becomes
larger (the groups number increases), both the MAP and
learning time of DSH increase. Setting α = 1.5 is a
reasonable balance considering both the accuracy and
the efficiency.

Fig. 6. The performance of DSH vs. the parameter r (for r-adjacent groups) at 64 bits.

Figure 6 shows how the performance of DSH varies as r
(r-adjacent groups) changes. DSH achieves stable and
consistently good performance when r is less than 5. As r
becomes larger, DSH generates more projections which
are used to separate two far away groups. These projections
are usually less critical and redundant. Thus, the
performance of DSH decreases.

5 Conclusion 

In this paper, we have developed a novel hashing algo- 
rithm, called Density Sensitive Hashing (DSH), for high 
dimensional nearest neighbors search. Different from 
those random projection based hashing approaches, e.g., 
Locality Sensitive Hashing, DSH uses the geometric 
structure of the data to guide the projections selection. 
As a result, DSH can generate hashing codes with more 
discriminating power. Empirical studies on three large 
data sets show that the proposed algorithm scales well 
to data size and significantly outperforms the state-of- 
the-art hashing methods in terms of retrieval accuracy. 